2.4 Aggregations: Minimum, Maximum, and Other Values

The source code for this article is available on my GitHub.

2.4.1 Summing the Values in an Array

import numpy as np
L = np.random.random(10)
sum(L)  # Python's built-in sum function
2.8464551359447516

np.sum(L)  # NumPy's sum function
2.8464551359447516
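
For a one-dimensional array the two calls return the same value, but they are not interchangeable in general. A minimal sketch, using a small hypothetical 2x3 array `A` that is not part of the original example: the built-in `sum` iterates over the first axis and adds the rows together, while `np.sum` reduces over every element by default.

```python
import numpy as np

# Hypothetical 2x3 array, chosen only for illustration
A = np.array([[1, 2, 3],
              [4, 5, 6]])

builtin_result = sum(A)    # adds the rows together: one value per column
numpy_result = np.sum(A)   # adds every element: a single scalar

print(builtin_result)  # [5 7 9]
print(numpy_result)    # 21
```

This is one reason to prefer `np.sum` (or the `.sum()` method) whenever the input is a NumPy array.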

2.4.2 max & min

big_array = np.random.rand(1000)
%timeit sum(big_array)
%timeit np.sum(big_array)
164 µs ± 2.71 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
3.09 µs ± 41.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
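
`%timeit` is an IPython magic, so it only works in IPython or a Jupyter notebook. In a plain Python script, a rough equivalent can be sketched with the standard-library `timeit` module. The variable names `n_runs`, `builtin_time`, and `numpy_time` are illustrative, and the measured times will differ from machine to machine.

```python
import timeit

import numpy as np

big_array = np.random.rand(1000)

# Call each function n_runs times and report the average time per call
n_runs = 1000
builtin_time = timeit.timeit(lambda: sum(big_array), number=n_runs) / n_runs
numpy_time = timeit.timeit(lambda: np.sum(big_array), number=n_runs) / n_runs

print(f"built-in sum: {builtin_time * 1e6:.2f} µs per call")
print(f"np.sum:       {numpy_time * 1e6:.2f} µs per call")
```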

%timeit min(big_array)
68.9 µs ± 2.82 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

max(big_array)
0.998342996761799

%timeit np.min(big_array)  # the NumPy version is clearly much faster
3.11 µs ± 284 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

np.max(big_array)
0.998342996761799

# A more concise syntax is to call the methods directly on the array object
print(big_array.max(), big_array.min())
0.998342996761799 0.0013296091676711086

1. Multidimensional aggregation

M = np.random.rand(3, 4)
print(M)
[[0.17460472 0.8095875  0.98024377 0.58942287]
 [0.16436885 0.47376126 0.27927504 0.55330698]
 [0.1979097  0.3506765  0.48979371 0.07578097]]

M.min(axis=0)  # minimum of each column
array([0.16436885, 0.3506765 , 0.27927504, 0.07578097])

M.max(axis=1)  # maximum of each row
array([0.98024377, 0.55330698, 0.48979371])
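
The `axis` argument names the dimension that is collapsed, not the one that is returned. A minimal sketch with a hypothetical 2x3 integer array `A` (not from the original example), chosen so the results can be checked by hand:

```python
import numpy as np

# Hypothetical 2x3 array with values that are easy to verify by eye
A = np.array([[1, 8, 3],
              [4, 2, 6]])

col_min = A.min(axis=0)  # collapse the rows: one minimum per column
row_max = A.max(axis=1)  # collapse the columns: one maximum per row

print(col_min)        # [1 2 3]
print(row_max)        # [8 6]
print(col_min.shape)  # (3,)
print(row_max.shape)  # (2,)
```

The result of `axis=0` has as many entries as the array has columns, and the result of `axis=1` has as many entries as it has rows.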

2.4.3 Demo: Heights of US Presidents

import pandas as pd

data = pd.read_csv("data/president_heights.csv")

heights = np.array(data['height(cm)'])
print(heights)  # all of the heights
[189 170 189 163 183 171 185 168 173 183 173 173 175 178 183 193 178 173
 174 183 183 168 170 178 182 180 183 178 182 188 175 179 183 193 182 183
 177 185 188 188 182 185]

# Summary statistics
print("Mean height: ", heights.mean())
print("standard deviation:", heights.std())
print("Minimum height: ", heights.min())
Mean height:  179.73809523809524
standard deviation: 6.931843442745892
Minimum height:  163
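
Note that `std()` computes the population standard deviation by default (`ddof=0`), dividing by N rather than N - 1. A minimal sketch on a small hypothetical array `x` (not the presidents data), comparing a manual calculation with NumPy's result:

```python
import numpy as np

# Hypothetical sample, chosen so that the mean is 5 and the population std is 2
x = np.array([2, 4, 4, 4, 5, 5, 7, 9])

mean = x.sum() / x.size
# Population standard deviation: square root of the mean squared deviation
manual_std = np.sqrt(((x - mean) ** 2).sum() / x.size)

population_std = x.std()     # default ddof=0, divides by N
sample_std = x.std(ddof=1)   # divides by N - 1 instead

print(manual_std)      # 2.0
print(population_std)  # 2.0
print(sample_std)      # about 2.14
```

Pass `ddof=1` when the data is a sample and an estimate of the population spread is wanted.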

# Compute the quantiles
print("25th percentile: ", np.percentile(heights, 25))
print("Median: ", np.median(heights))
print("75th percentile: ", np.percentile(heights, 75))
25th percentile:  174.25
Median:  182.0
75th percentile:  183.0
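
The 25th percentile of 174.25 is not one of the recorded heights because `np.percentile` interpolates between neighbouring sorted values. The sketch below reproduces that number by hand, assuming NumPy's default linear interpolation; the heights are copied from the printed array above, and `pos`, `lower`, `frac`, and `manual` are illustrative names.

```python
import numpy as np

# Heights copied from the array printed earlier in this section
heights = np.array([
    189, 170, 189, 163, 183, 171, 185, 168, 173, 183, 173, 173, 175, 178,
    183, 193, 178, 173, 174, 183, 183, 168, 170, 178, 182, 180, 183, 178,
    182, 188, 175, 179, 183, 193, 182, 183, 177, 185, 188, 188, 182, 185,
])

q = 25
sorted_h = np.sort(heights)

# Fractional position of the q-th percentile in the sorted array
pos = (q / 100) * (sorted_h.size - 1)
lower = int(np.floor(pos))
frac = pos - lower

# Linear interpolation between the two neighbouring sorted values
manual = sorted_h[lower] + frac * (sorted_h[lower + 1] - sorted_h[lower])

print(manual)                     # 174.25
print(np.percentile(heights, q))  # 174.25
```

With 42 values the position is 10.25, so the result lies a quarter of the way from the 11th sorted height (174) to the 12th (175).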

%matplotlib inline
import matplotlib.pyplot as plt
import seaborn
seaborn.set()  # set the plot style

plt.hist(heights)
plt.title("Height Distribution of US Presidents")
plt.xlabel("height (cm)")
plt.ylabel("number");